Research Project Report: Flood Detection and Mapping Using Sentinel-1 and CHIRPS Data¶
Introduction¶
Background¶
Flooding is one of the most devastating natural disasters, impacting lives, infrastructure, and the environment. With the increasing frequency of extreme weather events due to climate change, effective flood monitoring and mapping are essential. Remote sensing, particularly synthetic aperture radar (SAR), provides a reliable method for flood detection in all weather conditions, day or night, because radar pulses penetrate cloud cover.
Sentinel-1, with its SAR capabilities, and CHIRPS (Climate Hazards Group InfraRed Precipitation with Stations) rainfall data are pivotal in this regard. Sentinel-1 detects surface water changes through backscatter intensity variations, while CHIRPS provides auxiliary data on rainfall distribution.
Aim and Research Objectives¶
The aim of this research is to develop a flood detection and mapping methodology using Sentinel-1 SAR data and CHIRPS rainfall data. Specific objectives include:
- Preprocessing Sentinel-1 and CHIRPS data for multiple time periods (e.g., October 2022 and April 2018).
- Developing a flood detection model using machine learning.
- Validating flood detection results and producing a probabilistic flood map for both temporal snapshots.
Study Area Overview¶
The study focuses on a region in Benue State in Nigeria, which experiences seasonal flooding exacerbated by heavy rainfall. This area was chosen due to its history of flood events and the availability of data for multiple time periods, enabling a comparison between conditions in October 2022 and April 2018.
Data Selection Rationale¶
Sentinel-1¶
High-resolution SAR data capable of detecting water changes during and after flooding events. Two datasets were used:
- October 2022 Sentinel-1 data for an initial flood mapping exercise.
- April 2018 Sentinel-1 data for replicating and validating the methodology for a different historical flood event.
CHIRPS¶
Rainfall data providing information on precipitation levels to correlate with flood occurrences. Similarly, CHIRPS data from both October 2022 and April 2018 were used to compare rainfall patterns and their relationship with detected flooding.
Methods¶
Data Sources¶
Remote Sensing Data:¶
Sentinel-1 GRD VV polarization imagery for two periods:
- October 2022
- April 2018
CHIRPS rainfall data for the corresponding months (October 2022 and April 2018).
Auxiliary Data:¶
- Permanent water mask to exclude non-flooded water bodies.
Data Preprocessing¶
Sentinel-1:
- Conversion of Digital Numbers (DN) to Sigma0 in decibels (dB).
- Speckle filtering using a median filter.
- Thresholding to identify potential flooded areas.
- Alignment and rotation transforms applied to match geographical extents.
- Exclusion of permanent water bodies using a reprojected permanent water mask.
CHIRPS:
- Resampling to match Sentinel-1 resolution and extent for each time period.
Generating Flood Masks:
- Combining Sentinel-1 flood detection and permanent water mask to isolate flood-only pixels.
Data Analysis¶
Feature Engineering:
- Stacking Sentinel-1 backscatter (Sigma0) and CHIRPS rainfall data for each date to create a feature matrix.
Machine Learning (Random Forest Classifier):
- Addressing class imbalance using SMOTE.
- Training the model on labeled data (from one time period or both, depending on the research design).
- Generating flood probabilities for every pixel.
Batch Prediction for Memory Efficiency:
- Due to memory constraints when handling large datasets, predictions were performed in batches rather than all at once.
- This technique ensured that the computation would not run out of memory when predicting probabilities for millions of pixels.
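The batching strategy described above can be sketched as follows. This is a minimal illustration, not the notebook's exact cell: `predict_proba_in_batches` and the toy data are placeholders, and any fitted scikit-learn classifier with a `predict_proba` method would work the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def predict_proba_in_batches(model, X, batch_size=1_000_000):
    """Predict class probabilities batch by batch to bound peak memory."""
    parts = []
    for start in range(0, X.shape[0], batch_size):
        parts.append(model.predict_proba(X[start:start + batch_size]))
    return np.vstack(parts)

# Toy demonstration with synthetic data (the real inputs would be the
# stacked Sentinel-1/CHIRPS feature matrix).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5000, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

proba = predict_proba_in_batches(model, X_demo, batch_size=1000)
# One row per pixel, one probability column per class.
```

Because each batch is predicted independently and stacked afterwards, the result is identical to a single `predict_proba` call, only with a much smaller peak memory footprint.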
Validation:¶
- Model performance evaluated using accuracy, classification reports, and ROC curves for both time periods (April 2018 and October 2022).
Study Period Comparisons¶
By applying the same methodology to both October 2022 and April 2018 data, temporal comparison is possible. Differences in flood extent, probability, and correlation with rainfall patterns can be assessed.
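Given two flood probability maps already aligned to the same grid, the per-pixel temporal change is a simple array difference. The sketch below uses synthetic stand-ins for the October 2022 and April 2018 maps; the variable names are illustrative only.

```python
import numpy as np

# Synthetic stand-ins for two aligned flood probability maps (values in [0, 1]).
rng = np.random.default_rng(42)
prob_2022 = rng.random((100, 100)).astype(np.float32)
prob_2018 = rng.random((100, 100)).astype(np.float32)

# Positive values: flood probability increased from 2018 to 2022.
diff = prob_2022 - prob_2018

# Summarise the change: share of pixels with a marked increase or decrease.
increased = np.mean(diff > 0.25)
decreased = np.mean(diff < -0.25)
```

Because the maps share one grid, no resampling is needed at this stage; all alignment work happens earlier in preprocessing.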
import math
import rasterio
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import median_filter
from skimage.filters import threshold_otsu
from rasterio.warp import reproject, Resampling
from rasterio.crs import CRS
from rasterio.transform import from_bounds, Affine
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE
from skimage.transform import resize
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc, accuracy_score
from sklearn.utils.class_weight import compute_class_weight
import seaborn as sns
import matplotlib
from matplotlib.patches import Patch
from itertools import cycle
Loading Sentinel-1 VV Band Data¶
This cell opens the specified Sentinel-1 VV band GeoTIFF file and reads it as a NumPy array. It then prints basic information such as the shape, data type, and the min/max pixel values to understand the raw backscatter data range.
# Path to your Sentinel-1 VV band GeoTIFF file
vv_band_path = r'S:\471-688T\Fall24\uokoron1\S1A_IW_GRDH_1SDV_20221008T173754_20221008T173819_045354_056C37_72C6.SAFE\measurement\s1a-iw-grd-vv-20221008t173754-20221008t173819-045354-056c37-001.tiff'
# Open the VV band using rasterio
with rasterio.open(vv_band_path) as src:
    vv_band = src.read(1)  # Read the first (and only) band
print("VV Band Shape:", vv_band.shape)
print("VV Band Data Type:", vv_band.dtype)
print("Minimum pixel value:", np.min(vv_band))
print("Maximum pixel value:", np.max(vv_band))
VV Band Shape: (16786, 25218)
VV Band Data Type: uint16
Minimum pixel value: 0
Maximum pixel value: 15473
# Define a contrast stretch to improve visibility
vmin, vmax = np.percentile(vv_band, (2, 98))
# Plot the image with adjusted contrast
plt.figure(figsize=(10, 10))
plt.imshow(vv_band, cmap='gray', vmin=vmin, vmax=vmax)
plt.colorbar(label="Backscatter Intensity")
plt.title("Sentinel-1 VV Band - Contrast Stretched")
plt.show()
Contrast Stretching and Visualization of Raw Sentinel-1 Data¶
The previous cell applies a contrast stretch to the raw Sentinel-1 VV data to enhance visibility, then displays the image using a grayscale colormap, making features easier to identify.
Converting DN to Sigma0 (dB) and Visualization¶
Here, the Digital Number (DN) values are converted into backscatter intensity (Sigma0) in decibels (dB). A contrast stretch is applied again for better visualization. This provides a more meaningful representation of the SAR backscatter.
# Convert DN values to Sigma0 in dB. Cast to float32 first (squaring uint16
# values would overflow) and add a small offset to avoid log10(0) on
# zero-valued pixels. This is a simplified calibration; a full radiometric
# calibration would use the product's calibration LUT.
sigma0_dB = 10 * np.log10(vv_band.astype(np.float32)**2 + 1e-6)
# Apply a contrast stretch to the calibrated data
vmin, vmax = np.percentile(sigma0_dB, (2, 98))
# Display the calibrated and contrast-stretched image
plt.figure(figsize=(10, 10))
plt.imshow(sigma0_dB, cmap='gray', vmin=vmin, vmax=vmax, origin='upper')
plt.colorbar(label="Backscatter Intensity (dB)")
plt.title("Calibrated Sentinel-1 VV Band (Sigma0 in dB)")
plt.show()
Speckle Filtering and Initial Flood Mask Creation¶
Speckle noise is common in SAR images. A median filter is applied to reduce this noise. After filtering, a simple threshold is used to generate an initial flood mask. Pixels below the threshold are considered flooded, providing a first, rough classification of flooded vs. non-flooded areas.
# Apply median filter with a 3x3 window
filtered_vv = median_filter(sigma0_dB, size=3)
# Display the filtered image
plt.figure(figsize=(10, 10))
plt.imshow(filtered_vv, cmap='gray', origin='upper')
plt.colorbar()
plt.title("Speckle Filtered Sentinel-1 VV Band")
plt.show()
thresh = 35  # Empirical threshold for this scene's dB values
flood_mask = filtered_vv < thresh  # Low backscatter pixels flagged as potentially flooded
# Display the flood mask
plt.figure(figsize=(10, 10))
plt.imshow(flood_mask, cmap='Blues')
plt.colorbar(label="Flooded Area (1) / Non-Flooded Area (0)")
plt.title("Initial Flood Mask from Sentinel-1 VV Band")
plt.show()
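The fixed threshold of 35 dB above is scene-specific. `threshold_otsu` (imported at the top of the notebook but not used) offers a data-driven alternative that picks the cut between the dark-water and brighter-land modes of the histogram. A minimal sketch on synthetic bimodal data, not the notebook's actual scene:

```python
import numpy as np
from skimage.filters import threshold_otsu

# Synthetic bimodal "dB" image: a dark water mode and a brighter land mode.
rng = np.random.default_rng(0)
water = rng.normal(loc=25, scale=2, size=5000)
land = rng.normal(loc=45, scale=3, size=15000)
image = np.concatenate([water, land]).reshape(100, 200)

# Otsu picks the threshold that best separates the two intensity modes.
thresh = threshold_otsu(image)
flood_mask = image < thresh  # Low backscatter -> candidate water
```

On real scenes, Otsu works best when open water covers enough of the image to form a distinct histogram mode; otherwise a manually tuned threshold may remain more reliable.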
Integrating Permanent Water Mask and Refining the Flood Mask¶
- Cell 6: Loads the permanent water mask, which identifies areas that are always water and should not be classified as flood.
- Cell 7: Georeferences and reprojects the permanent water mask to align with the Sentinel-1 flood mask. Rotation and coordinate transformations are applied so both datasets match spatially.
- Cell 8: Combines the masks by removing permanent water bodies from the initial flood mask. This step refines our flood detection by focusing only on newly flooded areas and excluding long-standing water bodies.
# Path to the permanent water mask TIFF file
water_mask_path = r'S:\471-688T\Fall24\uokoron1\Copy of Benue_River_Occurrence.tif'
# Open the water mask file and read it as an array
with rasterio.open(water_mask_path) as src:
    permanent_water_mask = src.read(1)
# Display the permanent water mask
plt.figure(figsize=(10, 10))
plt.imshow(np.flipud(permanent_water_mask), cmap='Blues')
plt.colorbar(label="Permanent Water (1) / Land (0)")
plt.title("Permanent Water Mask")
plt.show()
# Define rotation angle in degrees
rotation_angle = 11 # Adjust this value to control rotation amount
# Convert angle to radians
theta = math.radians(rotation_angle)
with rasterio.open(vv_band_path) as src:
    flood_shape = src.shape
# Create transform with rotation
cos_theta = math.cos(theta)
sin_theta = math.sin(theta)
# Calculate the center point for rotation
center_lon = (7.91023875720884 + 10.45675839161536) / 2
center_lat = (7.125973782693571 + 9.09048554891996) / 2
# Create rotated transform
from rasterio.transform import Affine
flood_transform = Affine.translation(center_lon, center_lat) * \
Affine.rotation(rotation_angle) * \
Affine.translation(-center_lon, -center_lat) * \
from_bounds(
7.91023875720884 + 0.15,
7.125973782693571 + 0.20,
10.45675839161536 - 0.12,
9.09048554891996 - 0.20,
flood_shape[1],
flood_shape[0]
)
flood_crs = CRS.from_epsg(4326)
with rasterio.open(water_mask_path) as src:
    water_transform = src.transform
    water_crs = src.crs
    permanent_water_mask = src.read(1)
# Create an empty array with the same shape as the flood mask
resampled_water = np.empty(flood_shape, dtype=np.uint8)
# Reproject the water mask to match the flood mask
reproject(
source=permanent_water_mask,
destination=resampled_water,
src_transform=water_transform,
src_crs=water_crs,
dst_transform=flood_transform,
dst_crs=flood_crs,
resampling=Resampling.nearest
)
# Plot the aligned masks
plt.figure(figsize=(10, 10))
plt.imshow(flood_mask, cmap='Blues')
plt.imshow(np.flipud(resampled_water), cmap='Reds', alpha=0.57)
plt.colorbar(label="Blue=Flood / Red=Permanent Water")
plt.title("Flood Mask with Permanent Water Mask Overlay")
plt.show()
# Create the water_bool mask that is True for any non-zero values
water_bool = resampled_water > 0 # True where water occurrence is greater than 0%
# Ensure flood_mask is a boolean array
flood_bool = flood_mask.astype(bool)
# Exclude all overlapping areas
flood_only_mask = np.logical_and(flood_bool, ~water_bool) # True where flood is True and water_bool is False
# Plot the result
plt.figure(figsize=(10, 10))
plt.imshow(flood_only_mask, cmap='Blues')
plt.colorbar(label="Flood Areas (Excluding Any Permanent Water Overlap)")
plt.title("Flood Mask Excluding All Overlapping Permanent Water Areas")
plt.show()
Loading and Visualizing CHIRPS Rainfall Data¶
This cell loads the CHIRPS total rainfall data (for October 2022) and displays it. Rainfall data will later serve as an additional feature to help understand and model flood occurrence, as heavy rainfall often correlates with flooding.
# Path to the CHIRPS rainfall GeoTIFF file
chirps_file = r'S:\471-688T\Fall24\uokoron1\CHIRPS_Total_Rainfall_October2022.tif'
# Open the CHIRPS file
with rasterio.open(chirps_file) as src:
    chirps_data = src.read(1)
    chirps_bounds = src.bounds
    chirps_crs = src.crs
# Display CHIRPS data
plt.figure(figsize=(10, 8))
plt.imshow(chirps_data, cmap='Blues', extent=[
chirps_bounds.left, chirps_bounds.right,
chirps_bounds.bottom, chirps_bounds.top
])
plt.colorbar(label="Total Rainfall (mm)")
plt.title("CHIRPS Total Rainfall (October 2022)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
Reprojecting CHIRPS Data and Overlaying with Sentinel-1¶
- Cell 10: Reprojects the CHIRPS rainfall data to match the coordinate reference system and spatial resolution of the Sentinel-1 data, ensuring both datasets align perfectly.
- Cell 11: Visualizes Sentinel-1 data with CHIRPS rainfall as an overlay. This combined visualization helps in interpreting how rainfall patterns relate to SAR backscatter and potentially flooded areas.
# Define the CRS for Sentinel-1 and CHIRPS
sentinel_crs = CRS.from_epsg(4326) # Assuming WGS 84 (EPSG:4326)
# Define the correct bounding box for Sentinel-1 from metadata
sentinel_geo_bounds = (7.910294680585994, 7.126125871563444, 10.45800359142737, 9.090711951180333) # (min_lon, min_lat, max_lon, max_lat)
with rasterio.open(vv_band_path) as sentinel_src:
    # Assign correct transform and CRS to Sentinel-1
    sentinel_transform = from_bounds(
        sentinel_geo_bounds[0], sentinel_geo_bounds[1],  # min_lon, min_lat
        sentinel_geo_bounds[2], sentinel_geo_bounds[3],  # max_lon, max_lat
        sentinel_src.width, sentinel_src.height  # width (cols), height (rows)
    )
    sentinel_shape = sentinel_src.shape
    sentinel_data = sentinel_src.read(1)  # Read Sentinel-1 data
# Open CHIRPS data
with rasterio.open(chirps_file) as chirps_src:
    chirps_data = chirps_src.read(1)
    chirps_transform = chirps_src.transform
    chirps_crs = chirps_src.crs
# Prepare an empty array for resampled CHIRPS data
chirps_resampled = np.empty(sentinel_shape, dtype=chirps_data.dtype)
# Reproject CHIRPS to match Sentinel-1
reproject(
source=chirps_data,
destination=chirps_resampled,
src_transform=chirps_transform,
src_crs=chirps_crs,
dst_transform=sentinel_transform,
dst_crs=sentinel_crs,
resampling=Resampling.bilinear # Bilinear resampling for smoother data
)
# Plot Sentinel-1 with CHIRPS overlay
# Apply contrast stretching to Sentinel-1 data
vmin, vmax = np.percentile(sentinel_data, (2, 98)) # 2nd to 98th percentiles
plt.figure(figsize=(10, 8))
plt.imshow(sentinel_data, cmap='gray', vmin=vmin, vmax=vmax)
plt.colorbar(label="Backscatter Intensity")
plt.title("Sentinel-1 Data (Contrast-Stretched)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
# Convert Sentinel-1 DN values to Sigma0 in decibels (dB). Square the amplitude
# DNs for intensity, consistent with the earlier conversion, and add a small
# offset to avoid log10(0).
sentinel_sigma0 = 10 * np.log10(sentinel_data.astype(np.float32)**2 + 1e-6)
# Plot the Sigma0 values
vmin, vmax = np.percentile(sentinel_sigma0, (2, 98)) # 2nd to 98th percentiles
plt.figure(figsize=(10, 8))
plt.imshow(sentinel_sigma0, cmap='gray', vmin=vmin, vmax=vmax)
plt.colorbar(label="Backscatter Intensity (dB)")
plt.title("Sentinel-1 Data (Sigma0 in dB)")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
plt.figure(figsize=(10, 8))
plt.imshow(sentinel_sigma0, cmap='gray', vmin=vmin, vmax=vmax, alpha=0.7, extent=[
sentinel_geo_bounds[0], sentinel_geo_bounds[2], # Longitude range
sentinel_geo_bounds[1], sentinel_geo_bounds[3] # Latitude range
])
plt.imshow(chirps_resampled, cmap='Blues', alpha=0.5, extent=[
sentinel_geo_bounds[0], sentinel_geo_bounds[2],
sentinel_geo_bounds[1], sentinel_geo_bounds[3]
])
plt.colorbar(label="Rainfall (mm)")
plt.title("Sentinel-1 with CHIRPS Rainfall Overlay")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
Confirming CRS and Shape of the Sentinel-1 Data¶
This cell prints the Coordinate Reference System (CRS) and shape of the Sentinel-1 data. This information is essential for ensuring that all datasets share the same spatial reference and dimensions before proceeding with modeling.
# Define the assumed CRS for Sentinel-1 data
assumed_crs = CRS.from_epsg(4326) # WGS 84
# Open the VV band using rasterio to get its shape and assign CRS
with rasterio.open(vv_band_path) as src:
    vv_band = src.read(1)
    flood_shape = vv_band.shape
    flood_transform = src.transform
    flood_crs = src.crs if src.crs is not None else assumed_crs
print("Sentinel-1 CRS:", flood_crs)
print("Sentinel-1 Shape:", flood_shape)
Sentinel-1 CRS: EPSG:4326
Sentinel-1 Shape: (16786, 25218)
Creating Labeled Flood Classes¶
This cell uses the refined flood mask and permanent water mask to create a labeled dataset:
- Label 0: No flood
- Label 1: Flooded area
- Label 2: Permanent water
These labels provide supervised information for the machine learning model.
# Create flood labels
flood_labels = np.zeros(flood_only_mask.shape, dtype=np.uint8)
# Label permanent water areas (the occurrence raster is non-zero wherever
# water persists, so use > 0 rather than == 1, matching water_bool above)
flood_labels[resampled_water > 0] = 2  # Permanent water
# Label flooded areas
flood_labels[flood_only_mask] = 1 # Flooded areas
# Remaining areas will default to 0: No flood
Subsampling, Stacking Features, and Data Splitting¶
Due to large data size, the dataset is subsampled to reduce computational load. The Sentinel-1 data and CHIRPS rainfall data are combined into a feature matrix X, and labels y are extracted. Invalid data (NaNs) are removed, and the dataset is split into training and testing sets for model development and validation.
# First, subsample the data
subsample_factor = 10 # Only use every 10th pixel
sentinel_data = sentinel_data[::subsample_factor, ::subsample_factor]
chirps_resampled = chirps_resampled[::subsample_factor, ::subsample_factor]
flood_labels = flood_labels[::subsample_factor, ::subsample_factor]
# Convert to float32 to reduce memory usage
sentinel_data = sentinel_data.astype(np.float32)
chirps_resampled = chirps_resampled.astype(np.float32)
flood_labels = flood_labels.astype(np.uint8)
# Now stack the features (with reduced data size). Note: raw DN values are used
# here as the SAR feature; the calibrated Sigma0 array could be substituted.
X = np.stack([sentinel_data.flatten(), chirps_resampled.flatten()], axis=1)
y = flood_labels.flatten()
# Remove invalid data
valid_mask = ~np.isnan(X).any(axis=1) & (y >= 0)
X = X[valid_mask]
y = y[valid_mask]
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
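With flood pixels this rare, a stratified split keeps the class proportions consistent between the training and test sets. A sketch on synthetic imbalanced labels; `stratify=y` is the only change relative to the split above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: roughly 2% "flood" pixels.
rng = np.random.default_rng(0)
y_demo = (rng.random(10_000) < 0.02).astype(np.uint8)
X_demo = rng.normal(size=(10_000, 2))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
# The minority-class rate now matches (up to rounding) in both splits.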
Handling Class Imbalance and Training the Random Forest Model¶
Class imbalance is addressed using SMOTE (Synthetic Minority Oversampling Technique). A Random Forest classifier is trained on the resampled data. The performance is evaluated through a classification report, confusion matrix, and ROC curves, giving insight into the model’s predictive ability.
# Handle class imbalance with SMOTE
smote = SMOTE(sampling_strategy='auto', random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)
# Create and train Random Forest on resampled data
rf = RandomForestClassifier(
n_estimators=100,
random_state=42,
n_jobs=-1
)
# Train on resampled data
rf.fit(X_train_resampled, y_train_resampled)
# Make predictions on original test set
y_pred = rf.predict(X_test)
y_pred_proba = rf.predict_proba(X_test)[:, 1]
# Print classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, zero_division=1))
# Calculate and print confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
# Plot ROC curves (one-vs-rest, using each class's own probability column)
plt.figure(figsize=(10, 8))
colors = cycle(['blue', 'red', 'green'])
# Full probability matrix: one column per class, ordered as rf.classes_
y_proba_all = rf.predict_proba(X_test)
# Plot a ROC curve for each class the model knows about
for idx, (i, color) in enumerate(zip(rf.classes_, colors)):
    fpr, tpr, _ = roc_curve(y_test == i, y_proba_all[:, idx])
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, color=color, lw=2,
             label=f'ROC curve class {i} (AUC = {roc_auc:.2f})')
# Add diagonal line
plt.plot([0, 1], [0, 1], 'k--', lw=2)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves')
plt.legend(loc="lower right")
plt.show()
Classification Report:
precision recall f1-score support
0 1.00 0.99 1.00 1246774
1 0.71 0.94 0.81 23558
accuracy 0.99 1270332
macro avg 0.85 0.97 0.90 1270332
weighted avg 0.99 0.99 0.99 1270332
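As an alternative (or complement) to SMOTE, the Random Forest can penalise misclassifying the rare flood class directly through class weighting; `compute_class_weight` (imported at the top of the notebook but not used) makes the weighting explicit. A sketch on synthetic imbalanced data, not the notebook's actual features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced data: class 1 is rare, as flood pixels are.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5000, 2))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=5000) > 2).astype(int)

# Explicit per-class weights, inversely proportional to class frequency.
classes = np.unique(y_demo)
weights = compute_class_weight("balanced", classes=classes, y=y_demo)

# Equivalent shortcut: let the forest derive the same weighting itself.
rf_weighted = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=42, n_jobs=-1
).fit(X_demo, y_demo)
```

Unlike SMOTE, class weighting adds no synthetic samples and no extra memory cost, which matters at this dataset's scale; the trade-off is that it reweights existing pixels rather than enriching the minority class's feature space.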
Retraining Model on the Full Dataset¶
Here, the training and test sets are combined and SMOTE is applied again. A new Random Forest model (rf_full) is trained on the full resampled dataset. This step aims to produce a more robust model that can better generalize when predicting flood probabilities over the entire area.
# Combine training and test sets back into full dataset
X_full = X
y_full = y
# Handle class imbalance with SMOTE on the full dataset
smote_full = SMOTE(sampling_strategy='auto', random_state=42)
X_full_resampled, y_full_resampled = smote_full.fit_resample(X_full, y_full)
# Create and train a new Random Forest on the resampled full dataset
rf_full = RandomForestClassifier(
n_estimators=100,
random_state=42,
n_jobs=-1
)
rf_full.fit(X_full_resampled, y_full_resampled)
RandomForestClassifier(n_jobs=-1, random_state=42)
# Predict flood probabilities for the entire dataset in batches to manage memory
all_probabilities = []
batch_size = 1_000_000
for start in range(0, X_full.shape[0], batch_size):
    end = min(start + batch_size, X_full.shape[0])
    X_batch = X_full[start:end]
    # Get the probability of the positive class (column index of class 1)
    y_proba_batch = rf_full.predict_proba(X_batch)[:, 1]
    all_probabilities.extend(y_proba_batch)
    print(f"Processed batch {start} to {end}")
# Convert to NumPy array
all_probabilities = np.array(all_probabilities)
# Save all probabilities to a compressed file
np.savez_compressed("predictions_proba.npz", y_proba=all_probabilities)
Processed batch 0 to 1000000
Processed batch 1000000 to 2000000
Processed batch 2000000 to 3000000
Processed batch 3000000 to 4000000
Processed batch 4000000 to 4234438
Generating and Visualizing the Final Flood Probability Map¶
The previously saved flood probabilities are loaded and reshaped into a 2D map. The map is then reprojected, aligned with the permanent water mask, and upscaled for better visualization. A final plot overlays permanent water areas on top of the probabilistic flood map, providing a comprehensive flood detection product.
# Load the saved flood probabilities
data = np.load("predictions_proba.npz")
y_proba = data["y_proba"]
# Define the shape of the subsampled Sentinel-1 data
original_shape = sentinel_data.shape # (height, width)
# Reshape probabilities to match the 2D flood map (this assumes the validity
# mask dropped no pixels, so the flattened length equals the grid size)
flood_map = y_proba.reshape(original_shape)
# Define rotation parameters
rotation_angle = 11
theta = math.radians(rotation_angle)
center_lon = (7.91023875720884 + 10.45675839161536) / 2
center_lat = (7.125973782693571 + 9.09048554891996) / 2
flood_transform = Affine.translation(center_lon, center_lat) * \
Affine.rotation(rotation_angle) * \
Affine.translation(-center_lon, -center_lat) * \
from_bounds(
7.91023875720884 + 0.15,
7.125973782693571 + 0.20,
10.45675839161536 - 0.12,
9.09048554891996 - 0.20,
flood_map.shape[1],
flood_map.shape[0]
)
flood_crs = CRS.from_epsg(4326)
# Reproject the permanent water mask to match the flood map
resampled_water = np.empty(flood_map.shape, dtype=np.uint8)
reproject(
source=permanent_water_mask,
destination=resampled_water,
src_transform=water_transform,
src_crs=water_crs,
dst_transform=flood_transform,
dst_crs=flood_crs,
resampling=Resampling.nearest
)
# Upscale both maps to 1000x1000
flood_map_upscaled = resize(
flood_map,
(1000, 1000),
order=1,
anti_aliasing=True,
preserve_range=True
).astype(np.float32)
resampled_water_upscaled = resize(
resampled_water,
(1000, 1000),
order=0,
anti_aliasing=False,
preserve_range=True
).astype(np.uint8)
# Define geographic bounds
sentinel_geo_bounds = (
7.91023875720884, # min_lon
7.125973782693571, # min_lat
10.45675839161536, # max_lon
9.09048554891996 # max_lat
)
# Create the plot
plt.figure(figsize=(12, 10))
flood_plot = plt.imshow(
flood_map_upscaled,
cmap='Reds',
alpha=0.7,
extent=[
sentinel_geo_bounds[0], sentinel_geo_bounds[2],
sentinel_geo_bounds[1], sentinel_geo_bounds[3]
]
)
water_colors = [(0, 0, 0, 0), (0, 0.5, 1, 0.9)]
water_cmap = matplotlib.colors.LinearSegmentedColormap.from_list('custom_blue', water_colors)
water_plot = plt.imshow(
np.flipud(resampled_water_upscaled),
cmap=water_cmap,
extent=[
sentinel_geo_bounds[0], sentinel_geo_bounds[2],
sentinel_geo_bounds[1], sentinel_geo_bounds[3]
],
zorder=2
)
cbar = plt.colorbar(flood_plot)
cbar.set_label('Flood Probability', fontsize=12)
legend_elements = [Patch(facecolor=(0, 0.5, 1, 0.9), label='Permanent Water')]
plt.legend(handles=legend_elements, loc='upper right')
plt.title('Flood Detection Map with Permanent Water Overlay', fontsize=14)
plt.xlabel('Longitude', fontsize=12)
plt.ylabel('Latitude', fontsize=12)
plt.tight_layout()
plt.show()
Applying the Methodology to April 2018 Data¶
This cell replicates the entire workflow for a different time period (April 2018). It:
- Loads and preprocesses Sentinel-1 data for April 2018.
- Aligns and reprojects the permanent water mask and CHIRPS data from April 2018.
- Creates the feature matrix and labels.
- Trains (or uses an existing) Random Forest model.
- Uses batch prediction to produce a flood probability map for April 2018.
- Finally, it visualizes the April 2018 flood probability map with permanent water overlay for temporal comparison.
sentinel_2018_path = r'S:\471-688T\Fall24\uokoron1\S1A_IW_GRDH_1SDV_20180408T173721_20180408T173746_021379_024CD9_5813.SAFE\measurement\s1a-iw-grd-vv-20180408t173721-20180408t173746-021379-024cd9-001.tiff'
chirps_2018_path = r'S:\471-688T\Fall24\uokoron1\CHIRPS_Total_Rainfall_April2018.tif'
water_mask_path = r'S:\471-688T\Fall24\uokoron1\Copy of Benue_River_Occurrence.tif'
# Geographic bounds and rotation angle
sentinel_geo_bounds = (
7.91023875720884, # min_lon
7.125973782693571, # min_lat
10.45675839161536, # max_lon
9.09048554891996 # max_lat
)
rotation_angle = 11
theta = math.radians(rotation_angle)
center_lon = (7.91023875720884 + 10.45675839161536) / 2
center_lat = (7.125973782693571 + 9.09048554891996) / 2
flood_crs = CRS.from_epsg(4326)
with rasterio.open(sentinel_2018_path) as src:
    vv_band_2018 = src.read(1).astype(np.float32)
    sentinel_transform_2018 = from_bounds(
        sentinel_geo_bounds[0], sentinel_geo_bounds[1],
        sentinel_geo_bounds[2], sentinel_geo_bounds[3],
        src.width, src.height
    )
sentinel_shape_2018 = vv_band_2018.shape
# Convert to sigma0 in dB
sigma0_2018 = 10 * np.log10((vv_band_2018**2) + 1e-6)
# Apply median filter
filtered_vv_2018 = median_filter(sigma0_2018, size=3)
# Threshold for flood mask
thresh_2018 = 35
flood_mask_2018 = filtered_vv_2018 < thresh_2018
with rasterio.open(water_mask_path) as src:
    permanent_water_mask = src.read(1)
    water_transform = src.transform
    water_crs = src.crs
# Create transform with rotation for flood data
flood_transform_2018 = Affine.translation(center_lon, center_lat) * \
Affine.rotation(rotation_angle) * \
Affine.translation(-center_lon, -center_lat) * \
from_bounds(
7.91023875720884 + 0.15,
7.125973782693571 + 0.20,
10.45675839161536 - 0.12,
9.09048554891996 - 0.20,
sentinel_shape_2018[1],
sentinel_shape_2018[0]
)
# Reproject permanent water to Sentinel-1 grid
resampled_water_2018 = np.empty(sentinel_shape_2018, dtype=np.uint8)
reproject(
source=permanent_water_mask,
destination=resampled_water_2018,
src_transform=water_transform,
src_crs=water_crs,
dst_transform=flood_transform_2018,
dst_crs=flood_crs,
resampling=Resampling.nearest
)
# Exclude permanent water from flood mask
water_bool_2018 = resampled_water_2018 > 0
flood_bool_2018 = flood_mask_2018.astype(bool)
flood_only_mask_2018 = np.logical_and(flood_bool_2018, ~water_bool_2018)
with rasterio.open(chirps_2018_path) as src_chirps:
    chirps_data_2018 = src_chirps.read(1).astype(np.float32)
    chirps_transform_2018 = src_chirps.transform
    chirps_crs_2018 = src_chirps.crs
chirps_resampled_2018 = np.empty(sentinel_shape_2018, dtype=np.float32)
reproject(
source=chirps_data_2018,
destination=chirps_resampled_2018,
src_transform=chirps_transform_2018,
src_crs=chirps_crs_2018,
dst_transform=sentinel_transform_2018,
dst_crs=flood_crs,
resampling=Resampling.bilinear
)
flood_labels_2018 = np.zeros(sentinel_shape_2018, dtype=np.uint8)
flood_labels_2018[resampled_water_2018 > 0] = 2  # Permanent water (non-zero occurrence)
flood_labels_2018[flood_only_mask_2018] = 1 # Flooded areas
subsample_factor = 10
sentinel_data_sub_2018 = vv_band_2018[::subsample_factor, ::subsample_factor]
chirps_data_sub_2018 = chirps_resampled_2018[::subsample_factor, ::subsample_factor]
flood_labels_sub_2018 = flood_labels_2018[::subsample_factor, ::subsample_factor]
X_full_2018 = np.stack([sentinel_data_sub_2018.flatten(), chirps_data_sub_2018.flatten()], axis=1)
y_full_2018 = flood_labels_sub_2018.flatten()
valid_mask_2018 = ~np.isnan(X_full_2018).any(axis=1) & (y_full_2018 >= 0)
X_full_2018 = X_full_2018[valid_mask_2018]
y_full_2018 = y_full_2018[valid_mask_2018]
X_train_2018, X_test_2018, y_train_2018, y_test_2018 = train_test_split(X_full_2018, y_full_2018, test_size=0.3, random_state=42)
# Handle class imbalance
smote_2018 = SMOTE(sampling_strategy='auto', random_state=42)
X_train_resampled_2018, y_train_resampled_2018 = smote_2018.fit_resample(X_train_2018, y_train_2018)
# Train Random Forest
rf_full_2018 = RandomForestClassifier(
n_estimators=100,
random_state=42,
n_jobs=-1
)
rf_full_2018.fit(X_train_resampled_2018, y_train_resampled_2018)
y_pred_2018 = rf_full_2018.predict(X_test_2018)
print("April 2018 Classification Report:")
print(classification_report(y_test_2018, y_pred_2018, zero_division=1))
# Prepare the full feature set for April 2018 predictions
X_all_2018 = np.stack([vv_band_2018.flatten(), chirps_resampled_2018.flatten()], axis=1)
valid_mask_all_2018 = ~np.isnan(X_all_2018).any(axis=1)
y_proba_2018 = np.full(X_all_2018.shape[0], np.nan, dtype=np.float32)
# Process predictions in batches to avoid memory issues
batch_size = 1_000_000
for start in range(0, X_all_2018.shape[0], batch_size):
    end = min(start + batch_size, X_all_2018.shape[0])
    # Select the subset of valid pixels for this batch
    valid_batch_mask = valid_mask_all_2018[start:end]
    X_batch = X_all_2018[start:end][valid_batch_mask]
    if X_batch.size == 0:
        continue
    y_proba_batch = rf_full_2018.predict_proba(X_batch)[:, 1]
    # Place the predictions back into the appropriate positions
    # (slicing returns a view, so this writes through to y_proba_2018)
    y_proba_2018[start:end][valid_batch_mask] = y_proba_batch
flood_map_2018 = y_proba_2018.reshape(sentinel_shape_2018)
flood_map_2018_upscaled = resize(
flood_map_2018,
(1000, 1000),
order=1,
anti_aliasing=True,
preserve_range=True
).astype(np.float32)
resampled_water_2018_upscaled = resize(
resampled_water_2018,
(1000, 1000),
order=0,
anti_aliasing=False,
preserve_range=True
).astype(np.uint8)
plt.figure(figsize=(12, 10))
flood_plot_2018 = plt.imshow(
flood_map_2018_upscaled,
cmap='Reds',
alpha=0.7,
extent=[
sentinel_geo_bounds[0], sentinel_geo_bounds[2],
sentinel_geo_bounds[1], sentinel_geo_bounds[3]
]
)
water_colors = [(0, 0, 0, 0), (0, 0.5, 1, 0.9)]
water_cmap = matplotlib.colors.LinearSegmentedColormap.from_list('custom_blue', water_colors)
water_plot_2018 = plt.imshow(
np.flipud(resampled_water_2018_upscaled),
cmap=water_cmap,
extent=[
sentinel_geo_bounds[0], sentinel_geo_bounds[2],
sentinel_geo_bounds[1], sentinel_geo_bounds[3]
],
zorder=2
)
cbar = plt.colorbar(flood_plot_2018)
cbar.set_label('Flood Probability', fontsize=12)
legend_elements = [Patch(facecolor=(0, 0.5, 1, 0.9), label='Permanent Water')]
plt.legend(handles=legend_elements, loc='upper right')
plt.title('April 2018 Flood Detection Map with Permanent Water Overlay', fontsize=14)
plt.xlabel('Longitude', fontsize=12)
plt.ylabel('Latitude', fontsize=12)
plt.tight_layout()
plt.show()
April 2018 Classification Report:
              precision    recall  f1-score   support

           0       1.00      0.98      0.99   1259684
           1       0.29      0.91      0.44     12161

    accuracy                           0.98   1271845
   macro avg       0.65      0.94      0.72   1271845
weighted avg       0.99      0.98      0.98   1271845
Comparing Flood Probability Between Two Time Periods¶
In this section, we assume that we have two flood probability maps from different time periods (e.g., October 2022 and April 2018), both aligned and resampled to the same spatial resolution and extent. By subtracting the April 2018 flood probability map from the October 2022 map, we can visualize changes in flood likelihood over time.
What the code does:
- Computes the difference in flood probability for each pixel.
- Displays the result as a map, where positive values indicate an increase in flood probability from 2018 to 2022, while negative values show a decrease.
Interpretation:
- Areas shaded in red represent regions that have become more prone to flooding over time; in the actual results, red appears only where there is permanent water.
- Areas shaded in blue represent regions that were more prone to flooding in the past but are now less likely to be flooded.
- White or light shades near zero indicate minimal change.
# Difference of probability maps: positive = higher flood probability in 2022
flood_difference = flood_map_upscaled - flood_map_2018_upscaled
plt.figure(figsize=(10, 8))
plt.imshow(flood_difference, cmap='RdBu', vmin=-1, vmax=1)
plt.colorbar(label="Difference in Flood Probability (2022 - 2018)")
plt.title("Change in Flood Probability Over Time")
plt.xlabel("Pixel X")
plt.ylabel("Pixel Y")
plt.show()
Results¶
Mapped Products¶
Flood Mask (Binary):
A preliminary classification distinguishing flooded and non-flooded areas for both April 2018 and October 2022.

Flood Probability Map:
A probabilistic flood likelihood map for each time period, providing a gradient of flood risk levels based on the Random Forest classifier outputs.
Observations¶
- Higher rainfall intensity (as indicated by CHIRPS) generally corresponds to increased flood probability in the Sentinel-1-derived maps.
- Batch processing for `predict_proba` allowed handling large datasets without memory errors, enabling full spatial coverage for both the historical (April 2018) and more recent (October 2022) events.
Discussion & Conclusions¶
Key Findings¶
- Integrating Sentinel-1 SAR data with CHIRPS rainfall data enhances flood detection and mapping capabilities.
- The methodology is reproducible enough to be applied to different time periods, demonstrated by the April 2018 and October 2022 comparisons.
Unexpected Results¶
- Certain regions showed discrepancies between expected flood patterns (based on rainfall) and SAR-detected flooding, suggesting data quality issues.
Strengths and Limitations¶
Strengths:¶
- All-weather flood detection from SAR data.
- Contextual understanding through rainfall data integration.
- Successful scaling of prediction steps using batching to overcome memory constraints.
Limitations:¶
- Dependence on thresholding, which might require periodic recalibration.
- The coarse resolution of CHIRPS data may not capture localized rainfall variations.
- Memory constraints necessitated batch processing solutions.
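One way to ease the recalibration burden noted above is to derive the backscatter threshold automatically per scene, for example with Otsu's method. Below is a minimal NumPy sketch; the function, the dB values, and the sample data are illustrative, not taken from the notebook:

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Pick the cut that maximizes between-class variance (Otsu's method)."""
    hist, edges = np.histogram(values.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    hist = hist.astype(np.float64)
    w0 = np.cumsum(hist)                           # pixels at or below each cut
    w1 = w0[-1] - w0                               # pixels above each cut
    m = np.cumsum(hist * centers)
    mu0 = m / np.where(w0 == 0, 1, w0)             # mean of the lower class
    mu1 = (m[-1] - m) / np.where(w1 == 0, 1, w1)   # mean of the upper class
    var_between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(var_between)]

# Example: bimodal Sigma0 (dB) sample -- water near -18 dB, land near -8 dB
rng = np.random.default_rng(0)
sigma0_db = np.concatenate([rng.normal(-18, 0.5, 500), rng.normal(-8, 0.5, 500)])
t = otsu_threshold(sigma0_db)
water_mask = sigma0_db < t
```

Because the threshold adapts to each scene's histogram, it would not need manual recalibration between acquisition dates, though the histogram must be clearly bimodal for the method to behave well.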
Future Work¶
- Experiment with alternative machine learning or deep learning models for improved accuracy.
- Explore advanced memory management techniques to handle even larger datasets or multiple time periods efficiently.
Model Performance Evaluation¶
Classification and Model Performance¶
October 2022 Results¶
- Confusion Matrix: The majority of non-flooded pixels (class 0) were correctly identified, with minimal misclassification. Flooded pixels (class 1) were also well-detected, though a small fraction were missed.
- Classification Report:
  - Overall Accuracy: ~99%
  - Class 0 (Non-Flooded): near-perfect precision and recall
  - Class 1 (Flooded):
    - Precision: 0.71
    - Recall: 0.94
    - F1-Score: 0.81
- Interpretation:
The model accurately discriminates flooded from non-flooded areas, with minimal misclassification. It generalizes well for October 2022 conditions.
April 2018 Results¶
- Confusion Matrix & Classification Report:
  - Overall Accuracy: ~98%
  - Class 0 (Non-Flooded):
    - Precision: 1.00
    - Recall: 0.98
    - F1-Score: 0.99
  - Class 1 (Flooded):
    - Precision: 0.29
    - Recall: 0.91
    - F1-Score: 0.44
- Interpretation:
The model’s weaker precision for the flooded class suggests challenging conditions in April 2018. High false positive rates indicate over-prediction of flooding, necessitating additional refinement or calibration.
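One simple form of the calibration suggested above is to sweep the decision threshold applied to the `predict_proba` outputs on a validation set and keep the cutoff that maximizes F1 for the flooded class. A small sketch in plain NumPy; the function name and the toy arrays are illustrative:

```python
import numpy as np

def best_f1_threshold(y_true, y_proba, thresholds=np.linspace(0.05, 0.95, 19)):
    """Return the probability cutoff with the highest F1 for the positive class."""
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        pred = y_proba >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy validation labels and predicted flood probabilities
y_true = np.array([0, 0, 0, 0, 1, 1])
y_proba = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
t, f1 = best_f1_threshold(y_true, y_proba)
```

Raising the cutoff above the default 0.5 trades recall for precision, which is the direction the April 2018 over-prediction would call for.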
Probabilistic Flood Maps¶
October 2022:
The flood probability map aligns closely with known flooding patterns, highlighting water-logged regions along river channels and floodplains. The permanent water overlay distinguishes stable water bodies from newly flooded areas.

April 2018:
Despite the same methodology, the lower precision for the flooded class is evident. Certain regions exhibit high flood probabilities that may not correspond to actual flooding, reflecting the variability in flood conditions between the two periods.
Summary of Findings¶
- High Model Accuracy:
  The model achieves over 98% overall accuracy, demonstrating reliable detection of non-flooded areas.
- Temporal Variability in Performance:
  The October 2022 results show reliable performance for both classes, while the April 2018 results highlight the need for further refinement.
- Utility of Probabilistic Maps:
  The probabilistic approach provides valuable spatial insight: for October 2022 it supports risk identification and planning, while April 2018 highlights areas for improvement.
Appendix¶
Jupyter Notebook Steps Summary¶
Preprocessing Sentinel-1 Data:
- DN to Sigma0 conversion
- Speckle filtering
- Flood thresholding and permanent water masking
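As a rough sketch of the first two preprocessing steps, DN values can be calibrated to Sigma0 in dB and then smoothed with a simple square median filter as a basic stand-in for speckle suppression. The calibration constant, window size, and sample array below are placeholders, not the notebook's actual values:

```python
import numpy as np

def dn_to_sigma0_db(dn, cal=500.0):
    """Radiometric calibration: Sigma0 = DN^2 / A^2, expressed in dB."""
    sigma0 = dn.astype(np.float64) ** 2 / cal ** 2
    return 10.0 * np.log10(np.maximum(sigma0, 1e-10))  # guard against log(0)

def median_speckle_filter(img, size=3):
    """Square median filter; replaces each pixel with its local median."""
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

dn = np.full((5, 5), 100, dtype=np.uint16)
dn[2, 2] = 1000                      # a bright speckle spike
sigma0 = dn_to_sigma0_db(dn)
filtered = median_speckle_filter(sigma0)
```

A median filter is the simplest option; adaptive filters such as Lee or Frost preserve edges better and are common choices for SAR preprocessing.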
Resampling CHIRPS Data:
- Aligning resolution and extent with Sentinel-1 for each study period.
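The alignment step can be approximated with a nearest-neighbour resample from the coarse CHIRPS grid onto the Sentinel-1 pixel grid. This sketch ignores map projections and geotransforms, which a real alignment (e.g. with `rasterio`) would account for; the function name and the 2x2 example grid are illustrative:

```python
import numpy as np

def resample_nearest(coarse, out_shape):
    """Replicate each coarse cell onto a finer grid of out_shape pixels."""
    rows = np.arange(out_shape[0]) * coarse.shape[0] // out_shape[0]
    cols = np.arange(out_shape[1]) * coarse.shape[1] // out_shape[1]
    return coarse[np.ix_(rows, cols)]

chirps = np.array([[1.0, 2.0],
                   [3.0, 4.0]])      # toy 2x2 rainfall grid (mm)
fine = resample_nearest(chirps, (4, 4))
```

Nearest-neighbour replication keeps rainfall totals unchanged within each coarse cell, which is usually preferable to interpolation when the value is an accumulation rather than a point measurement.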
Random Forest-based Flood Detection and Mapping:
- Handling class imbalance with SMOTE
- Batch-wise probability predictions to manage memory usage
- Comparing outputs between April 2018 and October 2022
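The SMOTE step can be illustrated with a bare-bones NumPy version that synthesizes new minority-class samples by interpolating between existing samples and their nearest neighbours. This is a simplified stand-in for `imblearn`'s implementation, and the variable names are illustrative:

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Synthesize n_new minority samples by interpolating toward k-NN neighbours."""
    rng = np.random.default_rng(seed)
    # pairwise distances within the minority class (fine for small classes)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours per sample
    base = rng.integers(0, len(X_min), n_new)    # random base samples
    neigh = nn[base, rng.integers(0, k, n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                 # interpolation fraction in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

X_flooded = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_synth = smote_like_oversample(X_flooded, n_new=10)
```

Because each synthetic point lies on a segment between two real flooded pixels in feature space, the classifier sees a denser but plausible minority class rather than exact duplicates, which is what plain random oversampling would produce.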